

 reinforcement-learning agent


Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks

arXiv.org Artificial Intelligence

Wireless networks used for the Internet of Things (IoT) are expected to rely largely on cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enable flexible network control and management. In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes. The stochastic variation of the Central Processing Unit (CPU) load, together with the potentially complex parallelisation of the cloud processes, makes dynamic resource allocation an interesting research challenge. This paper models the dynamic computational resource allocation problem as a Markov Decision Process (MDP) and designs a model-based reinforcement-learning agent to optimise the dynamic allocation of CPU usage. The agent uses value iteration to find the optimal policy for the MDP. To evaluate the agent's performance, we analyse two types of processes that can be used in cloud-based IoT networks and that have different levels of parallelisation capability, namely Software-Defined Radio (SDR) and Software-Defined Networking (SDN). The results show that our agent converges rapidly to the optimal policy, performs stably under different parameter settings, and matches or outperforms a baseline algorithm in energy savings across different scenarios.
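As a rough illustration of the value iteration step mentioned in the abstract, the following is a minimal sketch on a toy two-state MDP. The states, actions, transition probabilities, and rewards are all hypothetical and chosen only for illustration; they are not the paper's model, and the paper's agent is described as model-based, whereas this sketch simply assumes the model is already known.

```python
# Minimal value iteration sketch on a hypothetical CPU-allocation MDP.
# Every number below is an illustrative assumption, not taken from the paper.

def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s under value estimate V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state, reward) tuples."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy

# Hypothetical model: CPU load is "low" or "high"; the agent allocates one or
# two cores. Reward is negative energy cost, with an extra penalty when a
# high load is served by a single core.
P = {
    "low": {
        "one_core":  [(0.8, "low", -1.0), (0.2, "high", -1.0)],
        "two_cores": [(0.8, "low", -2.0), (0.2, "high", -2.0)],
    },
    "high": {
        "one_core":  [(0.3, "low", -6.0), (0.7, "high", -6.0)],
        "two_cores": [(0.3, "low", -2.0), (0.7, "high", -2.0)],
    },
}

V, policy = value_iteration(P)
print(policy)
```

In this toy model the load transitions do not depend on the action, so the resulting policy simply picks the cheaper allocation in each state; a more realistic model would let the allocation influence how the load evolves.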


An Ensemble of Linearly Combined Reinforcement-Learning Agents

AAAI Conferences

Reinforcement-learning (RL) algorithms are often tweaked and tuned to specific environments when applied, calling into question whether learning can truly be considered autonomous in these cases. In this work, we show how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning. Our approach learns a weighted linear combination of Q-values from multiple independent learning algorithms. In our evaluations in generalized RL environments, we find that the algorithm compares favorably to the best tuned algorithm. Our work provides a promising basis for further study into the use of ensemble methods in RL.
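The idea of a weighted linear combination of Q-values can be sketched as follows. This is only an assumption-laden illustration: the abstract does not say how the weights are learned, so the temporal-difference weight update below is a stand-in, and the Q-tables, state names, and function names are hypothetical.

```python
# Sketch of combining Q-values from several learners with linear weights.
# The TD-style weight update is an assumed stand-in, not the paper's rule.

def combined_q(q_tables, weights, state, action):
    """Weighted linear combination of the base learners' Q-values."""
    return sum(w * q[(state, action)] for w, q in zip(weights, q_tables))

def greedy_action(q_tables, weights, state, actions):
    """Action that maximises the combined Q-value in the given state."""
    return max(actions, key=lambda a: combined_q(q_tables, weights, state, a))

def td_update_weights(q_tables, weights, s, a, r, s2, actions,
                      alpha=0.1, gamma=0.9):
    """One gradient-style TD step on the weights, treating each base
    learner's Q-value as a feature (hypothetical update rule)."""
    target = r + gamma * max(combined_q(q_tables, weights, s2, b) for b in actions)
    delta = target - combined_q(q_tables, weights, s, a)
    return [w + alpha * delta * q[(s, a)] for w, q in zip(weights, q_tables)]

# Hypothetical Q-tables from two independent learners.
actions = ["left", "right"]
q1 = {("s0", "left"): 1.0, ("s0", "right"): 0.0,
      ("s1", "left"): 0.0, ("s1", "right"): 0.0}
q2 = {("s0", "left"): 0.0, ("s0", "right"): 2.0,
      ("s1", "left"): 0.0, ("s1", "right"): 0.0}
weights = [0.5, 0.5]

choice = greedy_action([q1, q2], weights, "s0", actions)
new_weights = td_update_weights([q1, q2], weights, "s0", "right", 2.0, "s1", actions)
print(choice, new_weights)
```

Treating each learner's Q-value as a feature keeps the combination linear, so a learner whose estimates track the observed returns gains weight over time under this assumed update.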